Geospatial Analysis

Author

Wan Kee

Published

February 22, 2024

Modified

March 11, 2024

1. Learning Objective

A choropleth map is a thematic map composed of coloured polygons and is used to represent statistical data using the color mapping symbology technique.

Choropleth mapping involves the symbolisation of enumeration units, such as countries, provinces, states, counties or census units, using area patterns or graduated colors. For example, a social scientist may need to use a choropleth map to portray the spatial distribution of aged population of Singapore by Master Plan 2014 Subzone Boundary.

  • Plot choropleth maps using tmap package

2. Import Packages

pacman::p_load(tidyverse, sf, tmap, psych)

3. Load Data

mpsz is the Master Plan 2014 Subzone Boundary in ESRI shapefile format. It can be downloaded at data.gov.sg This is a geospatial data. It consists of the geographical boundary of Singapore at the planning subzone level. The data is based on URA Master Plan 2014. st_read() function of sf package to import the MP14_SUBZONE_WEB_PL shapefile into R as a simple feature data frame.

mpsz <- st_read(dsn = "data/geospatial", 
                layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `/Users/chockwankee/Documents/chockwk/ISSS608_Visual Analytics/Topics/HO10/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
mpsz
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 10 features:
   OBJECTID SUBZONE_NO       SUBZONE_N SUBZONE_C CA_IND      PLN_AREA_N
1         1          1    MARINA SOUTH    MSSZ01      Y    MARINA SOUTH
2         2          1    PEARL'S HILL    OTSZ01      Y          OUTRAM
3         3          3       BOAT QUAY    SRSZ03      Y SINGAPORE RIVER
4         4          8  HENDERSON HILL    BMSZ08      N     BUKIT MERAH
5         5          3         REDHILL    BMSZ03      N     BUKIT MERAH
6         6          7  ALEXANDRA HILL    BMSZ07      N     BUKIT MERAH
7         7          9   BUKIT HO SWEE    BMSZ09      N     BUKIT MERAH
8         8          2     CLARKE QUAY    SRSZ02      Y SINGAPORE RIVER
9         9         13 PASIR PANJANG 1    QTSZ13      N      QUEENSTOWN
10       10          7       QUEENSWAY    QTSZ07      N      QUEENSTOWN
   PLN_AREA_C       REGION_N REGION_C          INC_CRC FMEL_UPD_D   X_ADDR
1          MS CENTRAL REGION       CR 5ED7EB253F99252E 2014-12-05 31595.84
2          OT CENTRAL REGION       CR 8C7149B9EB32EEFC 2014-12-05 28679.06
3          SR CENTRAL REGION       CR C35FEFF02B13E0E5 2014-12-05 29654.96
4          BM CENTRAL REGION       CR 3775D82C5DDBEFBD 2014-12-05 26782.83
5          BM CENTRAL REGION       CR 85D9ABEF0A40678F 2014-12-05 26201.96
6          BM CENTRAL REGION       CR 9D286521EF5E3B59 2014-12-05 25358.82
7          BM CENTRAL REGION       CR 7839A8577144EFE2 2014-12-05 27680.06
8          SR CENTRAL REGION       CR 48661DC0FBA09F7A 2014-12-05 29253.21
9          QT CENTRAL REGION       CR 1F721290C421BFAB 2014-12-05 22077.34
10         QT CENTRAL REGION       CR 3580D2AFFBEE914C 2014-12-05 24168.31
     Y_ADDR SHAPE_Leng SHAPE_Area                       geometry
1  29220.19   5267.381  1630379.3 MULTIPOLYGON (((31495.56 30...
2  29782.05   3506.107   559816.2 MULTIPOLYGON (((29092.28 30...
3  29974.66   1740.926   160807.5 MULTIPOLYGON (((29932.33 29...
4  29933.77   3313.625   595428.9 MULTIPOLYGON (((27131.28 30...
5  30005.70   2825.594   387429.4 MULTIPOLYGON (((26451.03 30...
6  29991.38   4428.913  1030378.8 MULTIPOLYGON (((25899.7 297...
7  30230.86   3275.312   551732.0 MULTIPOLYGON (((27746.95 30...
8  30222.86   2208.619   290184.7 MULTIPOLYGON (((29351.26 29...
9  29893.78   6571.323  1084792.3 MULTIPOLYGON (((20996.49 30...
10 30104.18   3454.239   631644.3 MULTIPOLYGON (((24472.11 29...

pop is the Singapore Residents by Planning Area/Subzone, Age Group, Sex and Type of Dwelling, June 2011-2020 in csv format (i.e. respopagesextod2011to2020.csv). This is an aspatial data fie. It can be downloaded at Department of Statistics, Singapore Although it does not contain any coordinates values, but it’s PA and SZ fields can be used as unique identifiers to geocode to MP14_SUBZONE_WEB_PL shapefile.

pop <- read_csv("data/aspatial/respopagesextod2011to2020.csv")
glimpse(pop)
Rows: 984,656
Columns: 7
$ PA   <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo K…
$ SZ   <chr> "Ang Mo Kio Town Centre", "Ang Mo Kio Town Centre", "Ang Mo Kio T…
$ AG   <chr> "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to_4", "0_to…
$ Sex  <chr> "Males", "Males", "Males", "Males", "Males", "Males", "Males", "M…
$ TOD  <chr> "HDB 1- and 2-Room Flats", "HDB 3-Room Flats", "HDB 4-Room Flats"…
$ Pop  <dbl> 0, 10, 30, 50, 0, 0, 40, 0, 0, 10, 30, 60, 0, 0, 40, 0, 0, 10, 30…
$ Time <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,…

4. Prepare Data

The following attributes will be derived:

YOUNG: Age group 0_to_4 and 20_to_24 ECONOMY ACTIVE: Age group 25_to_29 and 60_to_64 AGED: Age group 65_and_above TOTAL: All age group DEPENDENCY: Ratio between YOUNG and AGED : ECONOMIC ACTIVE

pop2020 <- pop %>%
  filter(Time == 2020) %>%
  group_by(PA, SZ, AG) %>%
  summarise(POP = sum(Pop)) %>%
  ungroup() %>%
  pivot_wider(names_from = AG, values_from = POP) %>%
  mutate(YOUNG = rowSums(.[3:6]) + rowSums(.[12])) %>%
  mutate(ECONOMY_ACTIVE = rowSums(.[7:11]) + rowSums(.[13:15]))%>%
  mutate(AGED = rowSums(.[16:21])) %>%
  mutate(TOTAL = rowSums(.[3:21])) %>%
  mutate(DEPENDENCY = (YOUNG + AGED)/ECONOMY_ACTIVE) %>%
  select(PA, SZ, YOUNG, ECONOMY_ACTIVE, AGED, TOTAL, DEPENDENCY)
glimpse(pop2020)
Rows: 332
Columns: 7
$ PA             <chr> "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio", "Ang Mo Kio",…
$ SZ             <chr> "Ang Mo Kio Town Centre", "Cheng San", "Chong Boon", "K…
$ YOUNG          <dbl> 1440, 6640, 6150, 5540, 2100, 3960, 2220, 4690, 0, 1220…
$ ECONOMY_ACTIVE <dbl> 2610, 15460, 13950, 12090, 3410, 8420, 4200, 11450, 0, …
$ AGED           <dbl> 760, 6050, 6470, 5120, 1310, 3610, 1530, 5100, 0, 750, …
$ TOTAL          <dbl> 4810, 28150, 26570, 22750, 6820, 15990, 7950, 21240, 0,…
$ DEPENDENCY     <dbl> 0.8429119, 0.8208279, 0.9046595, 0.8817204, 1.0000000, …
describe(pop2020)
               vars   n     mean       sd  median trimmed     mad  min    max
PA*               1 332    26.75    16.65   27.50   26.49   20.02 1.00     55
SZ*               2 332   166.50    95.98  166.50  166.50  123.06 1.00    332
YOUNG             3 332  3454.13  5249.71 1265.00 2322.26 1875.49 0.00  36950
ECONOMY_ACTIVE    4 332  6892.23 10198.28 2315.00 4770.30 3432.22 0.00  73950
AGED              5 332  1856.96  2626.05  695.00 1348.20 1030.41 0.00  20240
TOTAL             6 332 12203.31 17814.72 4330.00 8574.62 6419.66 0.00 131140
DEPENDENCY        7 234     0.86     1.20    0.79    0.79    0.12 0.11     19
                   range  skew kurtosis     se
PA*                54.00  0.09    -1.28   0.91
SZ*               331.00  0.00    -1.21   5.27
YOUNG           36950.00  2.53     8.87 288.12
ECONOMY_ACTIVE  73950.00  2.40     8.03 559.70
AGED            20240.00  2.45     9.87 144.12
TOTAL          131140.00  2.38     8.22 977.71
DEPENDENCY         18.89 14.62   217.36   0.08

PA and SZ of pop2020 and SUBZONE_N and PLN_AREA_N of mpszare standardized to uppercase.

pop2020 <- pop2020 %>%
  mutate_at(.vars = vars(PA, SZ), .funs = funs(toupper)) %>%
  filter(ECONOMY_ACTIVE > 0)
describe(pop2020)
               vars   n     mean       sd   median  trimmed      mad   min
PA*               1 234    19.82    12.88    19.50    19.46    17.05  1.00
SZ*               2 234   117.50    67.69   117.50   117.50    86.73  1.00
YOUNG             3 234  4900.73  5659.61  2895.00  3883.62  3832.52  0.00
ECONOMY_ACTIVE    4 234  9778.72 10927.13  5995.00  7872.39  7694.69 10.00
AGED              5 234  2634.66  2781.89  1920.00  2218.94  2364.75  0.00
TOTAL             6 234 17314.10 19025.75 11050.00 14115.43 14106.94 50.00
DEPENDENCY        7 234     0.86     1.20     0.79     0.79     0.12  0.11
                  max     range  skew kurtosis      se
PA*                42     41.00  0.16    -1.29    0.84
SZ*               234    233.00  0.00    -1.22    4.43
YOUNG           36950  36950.00  2.18     6.79  369.98
ECONOMY_ACTIVE  73950  73940.00  2.06     6.20  714.33
AGED            20240  20240.00  2.19     8.49  181.86
TOTAL          131140 131090.00  2.06     6.51 1243.75
DEPENDENCY         19     18.89 14.62   217.36    0.08

pop2020 and mpsz are joined using SZ and SUBZONE_N.

mpszpop2020 <- left_join(mpsz, pop2020,
                         by = c("SUBZONE_N" = "SZ"))
glimpse(mpszpop2020)
Rows: 323
Columns: 22
$ OBJECTID       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
$ SUBZONE_NO     <int> 1, 1, 3, 8, 3, 7, 9, 2, 13, 7, 12, 6, 1, 5, 1, 1, 3, 2,…
$ SUBZONE_N      <chr> "MARINA SOUTH", "PEARL'S HILL", "BOAT QUAY", "HENDERSON…
$ SUBZONE_C      <chr> "MSSZ01", "OTSZ01", "SRSZ03", "BMSZ08", "BMSZ03", "BMSZ…
$ CA_IND         <chr> "Y", "Y", "Y", "N", "N", "N", "N", "Y", "N", "N", "N", …
$ PLN_AREA_N     <chr> "MARINA SOUTH", "OUTRAM", "SINGAPORE RIVER", "BUKIT MER…
$ PLN_AREA_C     <chr> "MS", "OT", "SR", "BM", "BM", "BM", "BM", "SR", "QT", "…
$ REGION_N       <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGION", "…
$ REGION_C       <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "…
$ INC_CRC        <chr> "5ED7EB253F99252E", "8C7149B9EB32EEFC", "C35FEFF02B13E0…
$ FMEL_UPD_D     <date> 2014-12-05, 2014-12-05, 2014-12-05, 2014-12-05, 2014-1…
$ X_ADDR         <dbl> 31595.84, 28679.06, 29654.96, 26782.83, 26201.96, 25358…
$ Y_ADDR         <dbl> 29220.19, 29782.05, 29974.66, 29933.77, 30005.70, 29991…
$ SHAPE_Leng     <dbl> 5267.381, 3506.107, 1740.926, 3313.625, 2825.594, 4428.…
$ SHAPE_Area     <dbl> 1630379.27, 559816.25, 160807.50, 595428.89, 387429.44,…
$ PA             <chr> NA, "OUTRAM", "SINGAPORE RIVER", "BUKIT MERAH", "BUKIT …
$ YOUNG          <dbl> NA, 1200, 0, 3150, 2900, 3340, 3130, 0, 1290, 50, NA, 7…
$ ECONOMY_ACTIVE <dbl> NA, 2860, 40, 6900, 6020, 6800, 7700, 50, 2600, 140, NA…
$ AGED           <dbl> NA, 2120, 10, 3320, 1740, 3420, 3610, 10, 610, 60, NA, …
$ TOTAL          <dbl> NA, 6180, 50, 13370, 10660, 13560, 14440, 60, 4500, 250…
$ DEPENDENCY     <dbl> NA, 1.1608392, 0.2500000, 0.9376812, 0.7707641, 0.99411…
$ geometry       <MULTIPOLYGON [m]> MULTIPOLYGON (((31495.56 30..., MULTIPOLYG…
write_rds(mpszpop2020, "data/mpszpop2020.rds")

5. Choropleth Map

mpszpop2020 <- readRDS("data/mpszpop2020.rds")

qtm()

qtm() plots a quick thematic map and is a convenient wrapper of the main plotting method. The fill argument is either a colour or a data variable to draw a choropleth. The disadvantge of qtm() makes aesthetics of individual layers harder to control.

tmap_mode("plot")
qtm(mpszpop2020, fill = "DEPENDENCY")

tmap()

The basic building block of tmap is tm_shape() followed by the layer elements tm_shape() to define the input data and tm_polygons() to draw the planning subzone polygons.

Note that tm_polygons() is a wrapper of tm_fill() and tm_border(). tm_fill() shades the polygons by using the default colour scheme and tm_borders() adds the borders of the shapefile onto the choropleth map where the alpha argument is used to define transparency number between 0 (totally transparent) and 1 (default; not transparent).

tmap provides a total ten data classification methods, namely: fixed, sd, equal, pretty, quantile, kmeans, hclust, bclust, fisher, and jenks. The default interval binning used to draw the choropleth map is pretty and the default colour scheme is YlOrRd of ColorBrewer. By default, Missing value will be shaded in grey.

The pretty style chooses a number of breaks not necessarily equal to n using pretty, but likely to be legible.

tm_shape(mpszpop2020)+
  tm_polygons("DEPENDENCY")

The quantile style provides quantile breaks.

tm_shape(mpszpop2020)+
  tm_fill("DEPENDENCY",
          n = 5,
          style = "quantile") +
  tm_borders(alpha = 0.5)

The kmeans style uses kmeans to generate the breaks; it may be anchored using set.seed.

tm_shape(mpszpop2020)+
  tm_fill("DEPENDENCY",
          n = 5,
          style = "kmeans") +
  tm_borders(alpha = 0.5)

The hclust style uses hclust to generate the breaks using hierarchical clustering. The bclust style uses bclust to generate the breaks using bagged clustering.

tm_shape(mpszpop2020)+
  tm_fill("DEPENDENCY",
          n = 5,
          style = "bclust") +
  tm_borders(alpha = 0.5)

Committee Member: 1(1) 2(1) 3(1) 4(1) 5(1) 6(1) 7(1) 8(1) 9(1) 10(1)
Computing Hierarchical Clustering

The fisher style uses the algorithm proposed by W. D. Fisher (1958) and discussed by Slocum et al. (2005) as the Fisher-Jenks algorithm. This style will subsample by default for more than 3000 observations.

tm_shape(mpszpop2020)+
  tm_fill("DEPENDENCY",
          n = 5,
          style = "fisher") +
  tm_borders(alpha = 0.5)

jenks is a quantile data classification that used 5 classes. The jenks style has been ported from Jenks’ code, and has been checked for consistency with ArcView, ArcGIS, and MapInfo

tm_shape(mpszpop2020)+
  tm_fill("DEPENDENCY",
          n = 5,
          style = "jenks") +
  tm_borders(alpha = 0.5)

Breaks

Breaks are computed internally or set explicitly by means of the breaks argument to the tm_fill(). It is a good practice to obtain descriptive statistics on the variable before setting the break points.

summary(mpszpop2020$DEPENDENCY)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 0.1111  0.7147  0.7866  0.8585  0.8763 19.0000      92 

Based on the output, the break points are set at 0.60, 0.70, 0.80, and 0.90. In addition, a minimum and maximum will also be set at 0 and 100. Therefore, the breaks vector is c(0, 0.60, 0.70, 0.80, 0.90, 1.00).

tm_shape(mpszpop2020)+
  tm_fill("DEPENDENCY",
          breaks = c(0, 0.60, 0.70, 0.80, 0.90, 1.00)) +
  tm_borders(alpha = 0.5)

tm_shape(mpszpop2020)+
  tm_fill("DEPENDENCY", 
          style = "quantile", 
          palette = "Blues",
          title = "Dependency ratio") +
  tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authorithy (URA)\n and Population data from Department of Statistics DOS", 
             position = c("left", "bottom"))

Multiple Choropleth Plots

tm_shape(mpszpop2020)+ 
  tm_polygons(c("YOUNG","AGED"),
          style = c("pretty", "pretty"), 
          palette = list("Blues","Purples")) +
  tm_layout(legend.position = c("right", "bottom"))

tm_shape(mpszpop2020)+ 
  tm_polygons(c("YOUNG","AGED"),
          style = c("quantile", "quantile"), 
          palette = list("Blues","Purples")) +
  tm_layout(legend.position = c("right", "bottom"))

Group-by Variable

tm_shape(mpszpop2020) +
  tm_fill("DEPENDENCY",
          style = "quantile",
          palette = "Blues",
          thres.poly = 0) + 
  tm_facets(by="REGION_N", 
            free.coords=TRUE, 
            drop.shapes=FALSE) +
  tm_layout(legend.show = FALSE,
            title.position = c("center", "center"), 
            title.size = 20) +
  tm_borders(alpha = 0.5)

tm_shape(mpszpop2020[mpszpop2020$REGION_N=="CENTRAL REGION", ])+
  tm_fill("DEPENDENCY", 
          style = "quantile", 
          palette = "Blues", 
          legend.hist = TRUE, 
          legend.is.portrait = TRUE,
          legend.hist.z = 0.1) +
  tm_layout(legend.outside = TRUE,
            legend.height = 0.45, 
            legend.width = 5.0,
            legend.position = c("right", "bottom"),
            frame = FALSE) +
  tm_borders(alpha = 0.5)

6. Geospatial Data Point

Visualizing geospatial point data typically involves plotting points on a map to represent locations or events. Using the mapping library to create a map and add the data points as layers, the map appearance can be customized to enhance visualization, such as background, colors, and labels.

Proportional symbol maps (also known as graduate symbol maps) are a class of maps that use the visual variable of size to represent differences in the magnitude of a discrete, abruptly changing phenomenon.

7. Load Data

SGPools

SGPools contains the locations of Singapore outlets and branches and their respective winnings.

sgpools <- read_csv("data/aspatial/SGPools_svy21.csv")
glimpse(sgpools)
Rows: 306
Columns: 7
$ NAME              <chr> "Livewire (Marina Bay Sands)", "Livewire (Resorts Wo…
$ ADDRESS           <chr> "2 Bayfront Avenue, #01-01 The Shoppes at Marina Bay…
$ POSTCODE          <dbl> 18972, 98138, 738078, 188306, 552542, 731001, 370064…
$ XCOORD            <dbl> 30841.56, 26703.87, 20117.93, 29776.95, 32238.69, 21…
$ YCOORD            <dbl> 29598.56, 26525.70, 44888.06, 31382.18, 39518.76, 46…
$ `OUTLET TYPE`     <chr> "Branch", "Branch", "Branch", "Branch", "Branch", "B…
$ `Gp1Gp2 Winnings` <dbl> 5, 11, 0, 44, 0, 3, 17, 16, 21, 25, 22, 25, 31, 27, …

8. Prepare Data

st_as_sf() converts sgpools data frame into a simple feature data frame. The coords argument defines the x-coordinates and y-coordinates. The crs argument specifies the coordinates system in epsg format where EPSG: 3414 is Singapore SVY21 Projected Coordinate System.

sgpools_sf <- st_as_sf(sgpools, 
                       coords = c("XCOORD", "YCOORD"),
                       crs= 3414)

9. Interactive Geospatial Point Map

tm_bubbles() adds bubble symbols to the map and are often used to represent point data.

The size argument specifies the attribute used to determine the size of the bubbles. It can take on a numeric value as a standard size or a numeric attribute where the size will be proportional to the values of the “Gp1Gp2 Winnings” attribute.

The col argument specifies the color of the bubbles. It can be a colour value or a categorical attribute.

tmap_mode("view")

tm_shape(sgpools_sf) +
  tm_bubbles(col = "OUTLET TYPE",
             size = "Gp1Gp2 Winnings",
             border.col = "black",
             border.lwd = 1) +
  tm_view(set.zoom.limits = c(11,12))

tm_facets() divides the map into multiple facets based on the specified attribute. Each unique value of the attribute will result in a separate facet.

The by argument specifies the attribute used to create facets. Each unique value of the “OUTLET TYPE” attribute will result in a separate facet.

The nrow argument specifies the number of rows in the facet grid. Where nrow = 1, all facets will be arranged in a single row.

The sync argument specifies whether the scales of the facets should be synchronized. When set to TRUE, the scales (e.g., zoom levels) of all facets will be synchronized, allowing for easy comparison between facets.

tm_shape(sgpools_sf) +
  tm_bubbles(col = "OUTLET TYPE", 
          size = "Gp1Gp2 Winnings",
          border.col = "black",
          border.lwd = 1) +
  tm_facets(by= "OUTLET TYPE",
            nrow = 1,
            sync = TRUE) +
  tm_view(set.zoom.limits = c(11,12))

10. Geospatial Analytical Map

Geospatial analytical maps involve the visualization and analysis of spatial data to reveal patterns, trends, and relationships within a geographic context. These maps are designed to provide insights into spatial distributions and variations, allowing for better understanding and interpretation of data. Some common types of geospatial analytical maps include choropleth maps, rate maps, percentile maps, and boxmaps.

11. Import Data

NGA_wp is a polygon feature data frame providing information on water point of Nigeria at the LGA level.

NGA_wp <- read_rds("data/rds/NGA_wp.rds")

12. Choropleth Maps

Choropleth maps use color shading to represent variations in data values across geographic regions. Each region is shaded based on the value of a specific variable, such as population density, average income, or disease prevalence. Choropleth maps are useful for visualizing spatial patterns and disparities in data.

tmap_mode("view")

p1 <- tm_shape(NGA_wp) +
  tm_fill("wp_functional",
          n = 10,
          style = "equal",
          palette = "Blues") +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Distribution of functional water point by LGAs",
            legend.outside = FALSE)

p2 <- tm_shape(NGA_wp) +
  tm_fill("total_wp",
          n = 10,
          style = "equal",
          palette = "Blues") +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Distribution of total water point by LGAs",
            legend.outside = FALSE)

tmap_arrange(p2, p1, nrow = 1)

13. Rate Maps

Rate maps are a type of choropleth map that display rates or proportions per unit area. They are commonly used to visualize rates of occurrence or density of phenomena, such as crime rates, disease incidence, or population density. Rate maps often involve normalizing raw counts by a population or area size to allow for meaningful comparisons between different regions.

mutate() from dplyr package is used to derive two fields, namely pct_functional and pct_nonfunctional.

NGA_wp <- NGA_wp %>%
  mutate(pct_functional = wp_functional/total_wp) %>%
  mutate(pct_nonfunctional = wp_nonfunctional/total_wp)

tm_shape(NGA_wp) +
  tm_fill("pct_functional",
          n = 10,
          style = "equal",
          palette = "Blues",
          legend.hist = TRUE) +
  tm_borders(lwd = 0.1,
             alpha = 1) +
  tm_layout(main.title = "Rate map of functional water point by LGAs",
            legend.outside = TRUE)

14. Percentile Maps

Percentile maps divide the data into equal intervals based on percentiles and assign each interval a unique color or shading. This type of map is useful for identifying relative rankings or distributions of values within a dataset. Percentile maps highlight areas with high or low values relative to the entire dataset and can be effective for comparing distributions across different regions.

The data preparation as below:

drop_na() excludes records with NA.

NGA_wp <- NGA_wp %>%
  drop_na()

Create customised classification by defining percentile classes where percent is c(0,.01,.1,.5,.9,.99,1).

During variables extraction, the geometry is extracted but base R functions cannot deal with the geometry. To prevent an error from quantile(), st_set_geomtry() is set to NULL to drop geomtry field.

percent <- c(0,.01,.1,.5,.9,.99,1)
var <- NGA_wp["pct_functional"] %>%
  st_set_geometry(NULL)
quantile(var[,1], percent)
       0%        1%       10%       50%       90%       99%      100% 
0.0000000 0.0000000 0.2169811 0.4791667 0.8611111 1.0000000 1.0000000 

We will write an R function get.var to extract a variable (wp_nonfunctional) as a vector out of an sf data.frame.

The arguments are vname for variable name and df as the name of sf data frame It returns v, the vector with values.

get.var <- function(vname,df) {
  v <- df[vname] %>% 
    st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}
percentmap <- function(vnam, df, legtitle=NA, mtitle="Percentile Map"){
  percent <- c(0,.01,.1,.5,.9,.99,1)
  var <- get.var(vnam, df)
  bperc <- quantile(var, percent)
  tm_shape(df) +
  tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,
             title=legtitle,
             breaks=bperc,
             palette="Blues",
          labels=c("< 1%", "1% - 10%", "10% - 50%", "50% - 90%", "90% - 99%", "> 99%"))  +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("right","bottom"))
}

percentmap("total_wp", NGA_wp)

15. Boxmaps

Boxmaps, also known as box plot maps, are a spatial representation of box plots. They display summary statistics of a variable (e.g., median, quartiles, outliers) for different geographic areas. Each area is represented by a box plot, with the length of the box indicating the interquartile range (IQR) and the median value shown as a horizontal line within the box. Boxmaps provide a compact visualization of spatial variability and distribution of data.

The data preparation as below:

boxbreak is an R function that creating break points for a box map. The arguments v indicates vector with observations, multi is the multiplier for IQR (default 1.5). The function returns bb, a vector with 7 break points compute quartile and fences.

boxbreaks <- function(v,mult=1.5) {
  qv <- unname(quantile(v))
  iqr <- qv[4] - qv[2]
  upfence <- qv[4] + mult * iqr
  lofence <- qv[2] - mult * iqr
  # initialize break points vector
  bb <- vector(mode="numeric",length=7)
  # logic for lower and upper fences
  if (lofence < qv[1]) {  # no lower outliers
    bb[1] <- lofence
    bb[2] <- floor(qv[1])
  } else {
    bb[2] <- lofence
    bb[1] <- qv[1]
  }
  if (upfence > qv[5]) { # no upper outliers
    bb[7] <- upfence
    bb[6] <- ceiling(qv[5])
  } else {
    bb[6] <- upfence
    bb[7] <- qv[5]
  }
  bb[3:5] <- qv[2:4]
  return(bb)
}

get.var is a function to extract a variable as a vector out of an sf data frame. The arguments vname is the variable name and df is the name of sf data frame. It returns v, a vector with values.

get.var <- function(vname,df) {
  v <- df[vname] %>% st_set_geometry(NULL)
  v <- unname(v[,1])
  return(v)
}
var <- get.var("wp_nonfunctional", NGA_wp) 
boxbreaks(var)
[1] -56.5   0.0  14.0  34.0  61.0 131.5 278.0
boxmap <- function(vnam, df, legtitle=NA,
                   mtitle="Box Map", mult=1.5){
  var <- get.var(vnam,df)
  bb <- boxbreaks(var)
  
  tm_shape(df) +
    tm_polygons() +
  tm_shape(df) +
     tm_fill(vnam,title=legtitle,
             breaks=bb,
             palette="Blues",
             labels = c("lower outlier", "< 25%", "25% - 50%", "50% - 75%", "> 75%", "upper outlier")) +
  tm_borders() +
  tm_layout(main.title = mtitle, 
            title.position = c("left", "top"))
}

tmap_mode("plot")
boxmap("wp_nonfunctional", NGA_wp)

16. Isohyet Map

An isohyet map is a type of thematic map that represents lines of equal precipitation on a geographic area. These lines connect points of equal rainfall or precipitation amounts over a given period of time, such as a year or a month. Isohyet maps are commonly used in meteorology, climatology, and hydrology to visualize and analyze spatial variations in precipitation across a region.

The term “isohyet” is derived from Greek roots “iso,” meaning equal, and “hyetos,” meaning rainfall or precipitation. Isohyet maps typically display isohyetal lines, also known as isohyets, which connect areas with the same precipitation values. The spacing between isohyetal lines represents the amount of precipitation change between adjacent lines, with closer spacing indicating steeper gradients in precipitation levels.

16.1 Import Packages

viridis is a colour library. terra create grid (also known as raster) objects as the input and output of spatial interpolation. gstat performs spatial interpolation (geostatistics) to convert point to gradient; useful for spatial and spatio-temporal geostatistical modelling, prediction and simulation. automap performs automatic variogram modelling and kriging interpolation.

pacman::p_load(sf, terra, gstat, tmap, viridis, tidyverse, automap)

16.2 Import Data

rfstations provides location information of existing rainfall stations in Singapore. The data is downloaded from Meteological Service Singapore.

rfstations <- read.csv("data/aspatial/RainfallStation.csv")
glimpse(rfstations)
Rows: 63
Columns: 3
$ Station   <chr> "Admiralty", "Admiralty (West)", "Ang Mo Kio", "Boon Lay (Ea…
$ Latitude  <dbl> 1.4439, 1.4582, 1.3764, 1.3302, 1.3275, 1.3087, 1.3837, 1.38…
$ Longitude <dbl> 103.7854, 103.7953, 103.8492, 103.7205, 103.7042, 103.8180, …

rfdata provides weather data are rainfall stations for the month February, 2024. The data is also downloaded from Meteological Service Singapore. The Latitude and Longitude range shows WGS84 coordinate system and is developed by Greenwich Meridian.

read_csv() of readr package is used to import DAILYDATA_202402.csv. The output object rfdata is in tibble data.frame format.

rfdata <- read_csv("data/aspatial/DAILYDATA_202402.csv")
glimpse(rfdata)
Rows: 1,247
Columns: 13
$ Station                         <chr> "Paya Lebar", "Paya Lebar", "Paya Leba…
$ Year                            <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 20…
$ Month                           <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
$ Day                             <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,…
$ `Daily Rainfall Total (mm)`     <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 42.4, 0.0, 0.…
$ `Highest 30 min Rainfall (mm)`  <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 35.2, 0.0, 0.…
$ `Highest 60 min Rainfall (mm)`  <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 40.4, 0.0, 0.…
$ `Highest 120 min Rainfall (mm)` <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 42.4, 0.0, 0.…
$ `Mean Temperature (°C)`         <chr> "28.7", "28.8", "29.0", "29.1", "29.0"…
$ `Maximum Temperature (°C)`      <chr> "33.1", "33.5", "34.6", "34.0", "34.4"…
$ `Minimum Temperature (°C)`      <chr> "26.0", "25.9", "26.0", "25.1", "25.5"…
$ `Mean Wind Speed (km/h)`        <chr> "11.2", "10.0", "10.3", "8.6", "11.0",…
$ `Max Wind Speed (km/h)`         <chr> "35.2", "35.2", "29.6", "25.9", "37.0"…

mpsz2019 contains planning subzone boundary of URA Master Plan 2019. It is downloaded from data.gov.sg. The original data is in kml format.

mpsz2019 <- st_read(dsn = "data/geospatial",
                    layer = "MPSZ-2019") %>% 
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `/Users/chockwankee/Documents/chockwk/ISSS608_Visual Analytics/Topics/HO10/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz2019
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...

16.3 Prepare Data

Aggregate rainfall at month level (variability over empty value)

select() of dplyr package is used to retain column 1 and 5 of the input data. group_by() and summarise() of dplyr package are used to compute the total monthly rainfall from Daily Rainfall Total (mm) field. The output is stored in a new field called MONTHSUM.

rfdata <- rfdata %>% 
  select(c(1,5)) %>% 
  group_by(Station) %>% 
  summarise(MONTHSUM = sum(`Daily Rainfall Total (mm)`)) %>% 
  ungroup()
glimpse(rfdata)
Rows: 43
Columns: 2
$ Station  <chr> "Admiralty", "Ang Mo Kio", "Botanic Garden", "Bukit Panjang",…
$ MONTHSUM <dbl> 153.8, 150.0, 173.6, 173.6, 162.2, 130.8, 60.2, 187.8, 207.8,…

rfdata (43 rows) is the reference data for left_join with rfstations (63 rows) and joining with default by = join_by(Station). left_join() of dplyr is used to join rfstations to rfdata.

rfdata <- rfdata %>% 
  left_join(rfstations)
glimpse(rfdata)
Rows: 43
Columns: 4
$ Station   <chr> "Admiralty", "Ang Mo Kio", "Botanic Garden", "Bukit Panjang"…
$ MONTHSUM  <dbl> 153.8, 150.0, 173.6, 173.6, 162.2, 130.8, 60.2, 187.8, 207.8…
$ Latitude  <dbl> 1.4439, 1.3764, 1.3087, 1.3824, 1.3191, 1.2841, 1.3678, 1.38…
$ Longitude <dbl> 103.7854, 103.8492, 103.8180, 103.7603, 103.8191, 103.7886, …

st_as_sf() of sf package is used to convert rfdata into a simple feature data.frame object called rfdata_sf. st_transform of sf package converts the position (degrees) to geometry (metres); transforming the source data from wgs84 to svy21 projected coordinates system. SVY21 is the official projected coordinates of Singapore and the EPSG code of SVY21 is 3414.

For coords argument, it is important to map the X (i.e. Longitude) first, then follow by the Y (i.e. Latitude). crs = 4326 indicates that the source data is in wgs84 coordinates system.

rfdata_sf <- st_as_sf(rfdata, coords = c("Longitude", "Latitude"), # xaxis then yaxis
                      crs = 4326) %>% 
  st_transform(crs = 3414)

16.4 Spatial Interpolation

In order to prepare an isohyet map, spatial interpolation will be used. Spatial interpolation is the process of using points with known values to estimate values at other unknown points. For example, to make a rainfall above, we will not find enough evenly spread weather stations to cover the entire region. Spatial interpolation can estimate the rainfall at locations without recorded data by using known rainfall readings at nearby weather stations (see figure_temperature_map). This type of interpolated surface is often called a geostatistical surface. Elevation data, temperature, property prices, air quality index and population density are other types of data that can be computed using interpolation.

There are many interpolation methods. In this hands-on exercise, two widely used spatial interpolation methods called Inverse Distance Weighting (IDW) and kriging will be introduce. If you are looking for additional interpolation methods, please refer to the ‘Further Reading’ section at the end of this topic.

16.4.1 Rainfall Distribution

tmap functions are used to create a quantitative dot map of rainfall distribution by rainfall station in Singaspore. tmap_options(check.and.fix = TRUE) bypass topological errors. tm_borders() provides subzone borders with no fill. tm_polygons() is a combination of tm_borders() and tm_fill().

tmap_options(check.and.fix = TRUE)

tmap_mode("view")

tm_shape(mpsz2019)+
  tm_borders()+
tm_shape(rfdata_sf)+
  tm_dots(col = "MONTHSUM")
# tmap_mode("plot")

16.4.2 Create gstat object

In order to perform spatial interpolation by using gstat, we first need to create an object of class called gstat, using a function also called gstat. A gstat object contains all necessary information to conduct spatial interpolation, namely:

  • Model definition
  • Calibration data

Based on its arguments, the gstat function “understands” what type of interpolation model:

  • No variogram model → IDW
  • Variogram model, no covariates → Ordinary Kriging
  • Variogram model, with covariates → Universal Kriging

The complete decision tree of gstat, including several additional methods which we are not going to use, is shown in the figure below.

rast() of terra package create a grid data object. read the bounding box values; take the difference of xmax and xmin, and ymax and ymin.

Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33

If raster is 50m or 100m, divide by the difference.

grid <- terra::rast(mpsz2019, nrows = 690, ncols = 1075)
grid
class       : SpatRaster 
dimensions  : 690, 1075, 1  (nrow, ncol, nlyr)
resolution  : 49.98037, 50.01103  (x, y)
extent      : 2667.538, 56396.44, 15748.72, 50256.33  (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414) 

A list xy will be created by using xyFromCell() of terra package. xyFromCell() gets coordinates of the center of raster cells for a row, column, or cell number of a SpatRaster. Or get row, column, or cell numbers from coordinates or from each other.

xy <- terra::xyFromCell(grid, 1:ncell(grid))
head(xy)
            x        y
[1,] 2692.528 50231.33
[2,] 2742.509 50231.33
[3,] 2792.489 50231.33
[4,] 2842.469 50231.33
[5,] 2892.450 50231.33
[6,] 2942.430 50231.33

16.4.3 Create sf object

coop <- st_as_sf(as.data.frame(xy), 
                 coords = c("x", "y"),
                 crs = st_crs(mpsz2019))
coop <- st_filter(coop, mpsz2019)
head(coop)
Simple feature collection with 6 features and 0 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 25883.42 ymin: 50231.33 xmax: 26133.32 ymax: 50231.33
Projected CRS: SVY21 / Singapore TM
                   geometry
1 POINT (25883.42 50231.33)
2  POINT (25933.4 50231.33)
3 POINT (25983.38 50231.33)
4 POINT (26033.36 50231.33)
5 POINT (26083.34 50231.33)
6 POINT (26133.32 50231.33)

16.5 Inverse Distance Weighted (IDW)

In the IDW interpolation method, the sample points are weighted during interpolation such that the influence of one point relative to another declines with distance from the unknown point you want to create.

Weighting is assigned to sample points through the use of a weighting coefficient that controls how the weighting influence will drop off as the distance from new point increases. The greater the weighting coefficient, the less the effect points will have if they are far from the unknown point during the interpolation process. As the coefficient increases, the value of the unknown point approaches the value of the nearest observational point.

It is important to notice that the IDW interpolation method also has some disadvantages: the quality of the interpolation result can decrease, if the distribution of sample data points is uneven. Furthermore, maximum and minimum values in the interpolated surface can only occur at sample data points. This often results in small peaks and pits around the sample data points.

We are going to use three parameters of the gstat function: formula:

g = gstat(formula = annual ~ 1, data = rainfall)

  • The prediction “formula” specifying the dependent and the independent variables (covariates) data
  • The calibration data model
  • The variogram model

A formula object is created using the ~ operator, which separates names of dependent variables (to the left of the ~ symbol) and independent variables (to the right of the ~ symbol). Writing 1 to the right of the ~ symbol, as in ~ 1, means that there are no independent variables38.

Calculate inverse weighted nmax where 15 neighbours are considered.

res <- gstat(formula = MONTHSUM ~ 1, 
             locations = rfdata_sf, 
             nmax = 5,
             set = list(idp = 0))

Calculate the predicted values using interpolation

After defining the model, predict() is used to interpolate, i.e., to calculate predicted values.

The predict function accepts:

  • A raster—stars object, such as dem
  • A model—gstat object, such as g

The raster serves for two purposes:

  • Specifying the locations where we want to make predictions (in all methods)
  • Specifying covariate values (in Universal Kriging only)
resp <- predict(res, coop)
[inverse distance weighted interpolation]
glimpse(resp)
Rows: 314,019
Columns: 3
$ var1.pred <dbl> 144.14, 144.14, 144.14, 144.14, 144.14, 144.14, 144.14, 144.…
$ var1.var  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ geometry  <POINT [m]> POINT (25883.42 50231.33), POINT (25933.4 50231.33), P…
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred

pred <- terra::rasterize(resp, grid, 
                         field = "pred", 
                         fun = "mean")
tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(pred) + 
  tm_raster(alpha = 0.6, 
            palette = "viridis")

16.6 Kriging

Kriging is one of several methods that use a limited set of sampled data points to estimate the value of a variable over a continuous spatial field. An example of a value that varies across a random spatial field might be total monthly rainfall over Singapore. It differs from Inverse Distance Weighted Interpolation discussed earlier in that it uses the spatial correlation between sampled points to interpolate the values in the spatial field: the interpolation is based on the spatial arrangement of the empirical observations, rather than on a presumed model of spatial distribution. Kriging also generates estimates of the uncertainty surrounding each interpolated value.

In a general sense, the kriging weights are calculated such that points nearby to the location of interest are given more weight than those farther away. Clustering of points is also taken into account, so that clusters of points are weighted less heavily (in effect, they contain less information than single points). This helps to reduce bias in the predictions.

The kriging predictor is an “optimal linear predictor” and an exact interpolator, meaning that each interpolated value is calculated to minimize the prediction error for that point. The value that is generated from the kriging process for any actually sampled location will be equal to the observed value at this point, and all the interpolated values will be the Best Linear Unbiased Predictors (BLUPs).

Kriging will in general not be more effective than simpler methods of interpolation if there is little spatial autocorrelation among the sampled data points (that is, if the values do not co-vary in space). If there is at least moderate spatial autocorrelation, however, kriging can be a helpful method to preserve spatial variability that would be lost using a simpler method (for an example, see Auchincloss 2007, below).

Kriging can be understood as a two-step process:

first, the spatial covariance structure of the sampled points is determined by fitting a variogram; and second, weights derived from this covariance structure are used to interpolate values for unsampled points or blocks across the spatial field. Kriging methods require a variogram model. A variogram (sometimes called a “semivariogram”) is a visual depiction of the covariance exhibited between each pair of points in the sampled data. For each pair of points in the sampled data, the gamma-value or “semi-variance” (a measure of the half mean-squared difference between their values) is plotted against the distance, or “lag”, between them. The “experimental” variogram is the plot of observed values, while the “theoretical” or “model” variogram is the distributional model that best fits the data.

Firstly, we will calculate and examine the empirical variogram by using variogram() of gstat package. The function requires two arguments:

formula, the dependent variable and the covariates (same as in gstat, see Section 12.2.1) data, a point layer with the dependent variable and covariates as attributes as shown in the code chunk below.

v <- variogram(MONTHSUM ~ 1, 
               data = rfdata_sf)
plot(v)

We can then compare the plot with the theoretical models below.

With reference to the comparison above, am empirical variogram model will be fitted by using fit.variogram() of gstat package as shown in the code chunk below.

Plot statistics and fit into a model, followed by interpolation Based on model where selection is dependent on psill, model, range, and nugget range 900m or 5000m is the search radius.

fv <- fit.variogram(object = v, model = vgm(psill = 0.5, model = "Sph",
                                            range = 5000, nugget = 0.1))
fv
  model     psill    range
1   Nug 0.1129190    0.000
2   Sph 0.5292397 5213.396

We can visualise how well the observed data fit the model by plotting fv using the code chunk below.

plot(v, fv)

The plot above reveals that the empirical model fits rather well. In view of this, we will go ahead to perform spatial interpolation by using the newly derived model as shown in the code chunk below.

k <- gstat(formula = MONTHSUM ~ 1, 
           data = rfdata_sf, 
           model = fv)
k
data:
var1 : formula = MONTHSUM`~`1 ; data dim = 43 x 2
variograms:
        model     psill    range
var1[1]   Nug 0.1129190    0.000
var1[2]   Sph 0.5292397 5213.396

predict() of gstat package will be used to estimate the unknown grids by using the code chunk below. resp is a sf tibble data.frame with point features.

resp <- predict(k, coop)
[using ordinary kriging]
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp$pred <- resp$pred
resp
Simple feature collection with 314019 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2692.528 ymin: 15773.73 xmax: 56371.45 ymax: 50231.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
   var1.pred  var1.var                  geometry        x        y     pred
1   131.0667 0.6608399 POINT (25883.42 50231.33) 25883.42 50231.33 131.0667
2   130.9986 0.6610337  POINT (25933.4 50231.33) 25933.40 50231.33 130.9986
3   130.9330 0.6612129 POINT (25983.38 50231.33) 25983.38 50231.33 130.9330
4   130.8698 0.6613782 POINT (26033.36 50231.33) 26033.36 50231.33 130.8698
5   130.8092 0.6615303 POINT (26083.34 50231.33) 26083.34 50231.33 130.8092
6   130.7514 0.6616697 POINT (26133.32 50231.33) 26133.32 50231.33 130.7514
7   130.6965 0.6617971  POINT (26183.3 50231.33) 26183.30 50231.33 130.6965
8   130.6446 0.6619131 POINT (26233.28 50231.33) 26233.28 50231.33 130.6446
9   130.5958 0.6620184 POINT (26283.26 50231.33) 26283.26 50231.33 130.5958
10  132.5484 0.6542154 POINT (25033.76 50181.32) 25033.76 50181.32 132.5484
kpred <- terra::rasterize(resp, grid, 
                         field = "pred")
kpred
class       : SpatRaster 
dimensions  : 690, 1075, 1  (nrow, ncol, nlyr)
resolution  : 49.98037, 50.01103  (x, y)
extent      : 2667.538, 56396.44, 15748.72, 50256.33  (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414) 
source(s)   : memory
name        :      last 
min value   :  72.77826 
max value   : 195.53284 

The output object kpred is in SpatRaster object class with a spatial resolution of 50m x 50m. It consists of 1075 columns and 690 rows and in SVY21 projected coordinates system.

16.7 Mapping the interpolated rainfall raster

tmap functions are used to map the interpolated rainfall raster (i.e. kpred) by using the code chunk below.

tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(kpred) + 
  tm_raster(alpha = 0.6, 
            palette = "viridis",
            title = "Total monthly rainfall (mm)") +
  tm_layout(main.title = "Distribution of monthly rainfall, Feb 2024",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha =0.2)

16.8 Automatic Variogram Modelling

autofirVariogram() of automap package can be used to perform varigram modelling as shown in the code chunk below.

v_auto <- autofitVariogram(MONTHSUM ~ 1, 
                           rfdata_sf)
plot(v_auto)

v_auto
$exp_var
   np      dist     gamma dir.hor dir.ver   id
1  15  1957.436  311.9613       0       0 var1
2  33  3307.349  707.7685       0       0 var1
3  54  4861.368  848.1314       0       0 var1
4 116  6716.531  730.3969       0       0 var1
5 111  9235.708 1006.5381       0       0 var1
6 120 11730.199 1167.5988       0       0 var1
7 135 14384.636 1533.5903       0       0 var1

$var_model
  model    psill   range kappa
1   Nug     0.00       0   0.0
2   Ste 24100.71 1647955   0.3

$sserr
[1] 0.2178294

attr(,"class")
[1] "autofitVariogram" "list"            
k <- gstat(formula = MONTHSUM ~ 1, 
           model = v_auto$var_model,
           data = rfdata_sf)
k
data:
var1 : formula = MONTHSUM`~`1 ; data dim = 43 x 2
variograms:
        model    psill   range kappa
var1[1]   Nug     0.00       0   0.0
var1[2]   Ste 24100.71 1647955   0.3
resp <- predict(k, coop)
[using ordinary kriging]
resp$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp$pred <- resp$pred

kpred <- terra::rasterize(resp, grid, 
                         field = "pred")
tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(kpred) + 
  tm_raster(alpha = 0.6, 
            palette = "viridis",
            title = "Total monthly rainfall (mm)") +
  tm_layout(main.title = "Distribution of monthly rainfall, Feb 2024",
            main.title.position = "center",
            main.title.size = 1.2,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_compass(type="8star", size = 2) +
  tm_scale_bar() +
  tm_grid(alpha =0.2)

References:

https://isss608-vaa-demo.netlify.app/in-class_ex/in-class_ex07/in-class_ex07-isomap

Olea, Ricardo A. (2006-07) “A six-step practical approach to semivariogram modeling”, Stochastic Environmental Research and Risk Assessment, 2006-07, Vol.20 (5), p.307-318. SMU e-journal.